svm: Rationalise register synchronisation to match our vmx handling.

1. Do not copy all VMCB register state into cpu_user_regs on every
vmexit.
2. Save/restore RAX inside the asm stub (in particular, before STGI on
vmexit).
3. Simplify store/load_cpu_guest_regs() hook functions to synchronise
precisely the same state as VMX.
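
The three points above can be pictured with a minimal C sketch. All
names here (sketch_vmcb, sketch_user_regs, the sketch_* functions) are
hypothetical stand-ins rather than the real Xen structures or hooks, and
the set of fields the hooks copy (RIP, RSP, RFLAGS) is an assumption made
for illustration; this message does not list the exact state. The asm
stub is modelled as a plain C function.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the guest state held in the VMCB. */
struct sketch_vmcb {
    uint64_t rip, rsp, rflags, rax;
};

/* Hypothetical stand-in for cpu_user_regs. */
struct sketch_user_regs {
    uint64_t rip, rsp, rflags, rax;
};

/*
 * Models the vmexit stub: only RAX is copied out of the VMCB here.
 * Nothing else in the VMCB is touched on the exit path.
 */
void sketch_vmexit_stub(const struct sketch_vmcb *vmcb,
                        struct sketch_user_regs *regs)
{
    assert(vmcb && regs);
    regs->rax = vmcb->rax;
}

/* Models the vmentry stub: RAX is written back before resuming. */
void sketch_vmentry_stub(struct sketch_vmcb *vmcb,
                         const struct sketch_user_regs *regs)
{
    assert(vmcb && regs);
    vmcb->rax = regs->rax;
}

/*
 * Models the store hook: copies the remaining (assumed) state out of
 * the VMCB, and only when a caller explicitly asks for it.
 */
void sketch_store_guest_regs(const struct sketch_vmcb *vmcb,
                             struct sketch_user_regs *regs)
{
    assert(vmcb && regs);
    regs->rip = vmcb->rip;
    regs->rsp = vmcb->rsp;
    regs->rflags = vmcb->rflags;
}

/* Models the load hook: the inverse of the store hook. */
void sketch_load_guest_regs(struct sketch_vmcb *vmcb,
                            const struct sketch_user_regs *regs)
{
    assert(vmcb && regs);
    vmcb->rip = regs->rip;
    vmcb->rsp = regs->rsp;
    vmcb->rflags = regs->rflags;
}
```

In this sketch a handler that needs only RAX pays for one copy per exit,
and a handler that needs RIP, RSP or RFLAGS calls the store hook first.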
By my measurements this reduces the round-trip latency for a null
hypercall by around 150 cycles. This is about 3% of the ~5000-cycle
total on my AMD X2 system. Not a great win, but a nice extra on top of
the code rationalisation.